{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training at scale with AI Platform Training Service\n",
    "**Learning Objectives:**\n",
    "  1. Learn how to organize your training code into a Python package\n",
    "  1. Train your model using cloud infrastructure via Google Cloud AI Platform Training Service\n",
    "  1. (optional) Learn how to run your training package using Docker containers and push training Docker images on a Docker registry\n",
    "\n",
    "## Introduction\n",
    "\n",
    "In this notebook we'll make the jump from training locally, to do training in the cloud. We'll take advantage of Google Cloud's [AI Platform Training Service](https://cloud.google.com/ai-platform/). \n",
    "\n",
    "AI Platform Training Service is a managed service that allows the training and deployment of ML models without having to provision or maintain servers. The infrastructure is handled seamlessly by the managed service for us.\n",
    "\n",
    "Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [Solution Notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive2/building_production_ml_systems/solutions/1_training_at_scale.ipynb) for reference. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify your project name and bucket name in the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from google.cloud import bigquery"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Change the following cell as necessary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Change with your own bucket and project below:\n",
    "BUCKET =  \"<BUCKET>\"\n",
    "PROJECT = \"<PROJECT>\"\n",
    "REGION = \"<YOUR REGION>\"\n",
    "\n",
    "OUTDIR = \"gs://{bucket}/taxifare/data\".format(bucket=BUCKET)\n",
    "\n",
    "os.environ['BUCKET'] = BUCKET\n",
    "os.environ['OUTDIR'] = OUTDIR\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['REGION'] = REGION\n",
    "os.environ['TFVERSION'] = \"2.1\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Confirm below that the bucket is regional and its region equals to the specified region:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud storage ls --full --buckets gs://$BUCKET | grep \"gs://\\|Location\"\n",    "echo $REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT\n",
    "gcloud config set compute/region $REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create BigQuery tables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have not already created a BigQuery dataset for our data, run the following cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bq = bigquery.Client(project = PROJECT)\n",
    "dataset = bigquery.Dataset(bq.dataset(\"taxifare\"))\n",
    "\n",
    "try:\n",
    "    bq.create_dataset(dataset)\n",
    "    print(\"Dataset created\")\n",
    "except:\n",
    "    print(\"Dataset already exists\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a table with 1 million examples.\n",
    "\n",
    "Note that the order of columns is exactly what was in our CSV files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "\n",
    "CREATE OR REPLACE TABLE taxifare.feateng_training_data AS\n",
    "\n",
    "SELECT\n",
    "    (tolls_amount + fare_amount) AS fare_amount,\n",
    "    pickup_datetime,\n",
    "    pickup_longitude AS pickuplon,\n",
    "    pickup_latitude AS pickuplat,\n",
    "    dropoff_longitude AS dropofflon,\n",
    "    dropoff_latitude AS dropofflat,\n",
    "    passenger_count*1.0 AS passengers,\n",
    "    'unused' AS key\n",
    "FROM `nyc-tlc.yellow.trips`\n",
    "WHERE ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 1000)) = 1\n",
    "AND\n",
    "    trip_distance > 0\n",
    "    AND fare_amount >= 2.5\n",
    "    AND pickup_longitude > -78\n",
    "    AND pickup_longitude < -70\n",
    "    AND dropoff_longitude > -78\n",
    "    AND dropoff_longitude < -70\n",
    "    AND pickup_latitude > 37\n",
    "    AND pickup_latitude < 45\n",
    "    AND dropoff_latitude > 37\n",
    "    AND dropoff_latitude < 45\n",
    "    AND passenger_count > 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make the validation dataset be 1/10 the size of the training dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "\n",
    "CREATE OR REPLACE TABLE taxifare.feateng_valid_data AS\n",
    "\n",
    "SELECT\n",
    "    (tolls_amount + fare_amount) AS fare_amount,\n",
    "    pickup_datetime,\n",
    "    pickup_longitude AS pickuplon,\n",
    "    pickup_latitude AS pickuplat,\n",
    "    dropoff_longitude AS dropofflon,\n",
    "    dropoff_latitude AS dropofflat,\n",
    "    passenger_count*1.0 AS passengers,\n",
    "    'unused' AS key\n",
    "FROM `nyc-tlc.yellow.trips`\n",
    "WHERE ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 10000)) = 2\n",
    "AND\n",
    "    trip_distance > 0\n",
    "    AND fare_amount >= 2.5\n",
    "    AND pickup_longitude > -78\n",
    "    AND pickup_longitude < -70\n",
    "    AND dropoff_longitude > -78\n",
    "    AND dropoff_longitude < -70\n",
    "    AND pickup_latitude > 37\n",
    "    AND pickup_latitude < 45\n",
    "    AND dropoff_latitude > 37\n",
    "    AND dropoff_latitude < 45\n",
    "    AND passenger_count > 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export the tables as CSV files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "echo \"Deleting current contents of $OUTDIR\"\n",
    "gcloud storage rm --quiet --recursive --continue-on-error $OUTDIR\n",    "\n",
    "echo \"Extracting training data to $OUTDIR\"\n",
    "bq --location=US extract \\\n",
    "   --destination_format CSV  \\\n",
    "   --field_delimiter \",\" --noprint_header \\\n",
    "   taxifare.feateng_training_data \\\n",
    "   $OUTDIR/taxi-train-*.csv\n",
    "\n",
    "echo \"Extracting validation data to $OUTDIR\"\n",
    "bq --location=US extract \\\n",
    "   --destination_format CSV  \\\n",
    "   --field_delimiter \",\" --noprint_header \\\n",
    "   taxifare.feateng_valid_data \\\n",
    "   $OUTDIR/taxi-valid-*.csv\n",
    "\n",
    "gcloud storage ls --long $OUTDIR"   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud storage cat gs://$BUCKET/taxifare/data/taxi-train-000000000000.csv | head -2"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make code compatible with AI Platform Training Service\n",
    "In order to make our code compatible with AI Platform Training Service we need to make the following changes:\n",
    "\n",
    "1. Upload data to Google Cloud Storage \n",
    "2. Move code into a trainer Python package\n",
    "4. Submit training job with `gcloud` to train on AI Platform"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Upload data to Google Cloud Storage (GCS)\n",
    "\n",
    "Cloud services don't have access to our local files, so we need to upload them to a location the Cloud servers can read from. In this case we'll use GCS."
   ]
  },
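  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this lab the BigQuery extract above already wrote the CSV shards directly to GCS, so there is nothing left to upload. If you did have local files, a minimal sketch like the one below (using the `google-cloud-storage` client; the local file name is hypothetical) is one way to copy them into the bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import storage\n",
    "\n",
    "# A minimal sketch, not required for this lab: upload a hypothetical local CSV\n",
    "# to the bucket so that cloud services can read it.\n",
    "client = storage.Client(project=PROJECT)\n",
    "bucket = client.bucket(BUCKET)\n",
    "blob = bucket.blob('taxifare/data/local-sample.csv')  # destination object name (hypothetical)\n",
    "blob.upload_from_filename('./local-sample.csv')  # local file to upload (hypothetical)\n",
    "print('Uploaded to gs://{}/{}'.format(BUCKET, blob.name))"
   ]
  },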
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud storage ls gs://$BUCKET/taxifare/data"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Move code into a python package\n",
    "\n",
    "\n",
    "The first thing to do is to convert your training code snippets into a regular Python package that we will then `pip install` into the Docker container. \n",
    "\n",
    "A Python package is simply a collection of one or more `.py` files along with an `__init__.py` file to identify the containing directory as a package. The `__init__.py` sometimes contains initialization code but for our purposes an empty file suffices."
   ]
  },
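  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of that idea (the lab repo already ships the `./taxifare/trainer` directory, so running this is optional and harmless), the cell below creates the directory and the empty `__init__.py` marker; `model.py` and `task.py` are written by the `%%writefile` cells further below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pathlib\n",
    "\n",
    "# A minimal sketch: an empty __init__.py is all that is needed to mark\n",
    "# ./taxifare/trainer as an importable Python package.\n",
    "pkg_dir = pathlib.Path('./taxifare/trainer')\n",
    "pkg_dir.mkdir(parents=True, exist_ok=True)\n",
    "(pkg_dir / '__init__.py').touch()\n",
    "\n",
    "print(sorted(p.name for p in pkg_dir.iterdir()))"
   ]
  },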
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create the package directory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our package directory contains 3 files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ls ./taxifare/trainer/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Paste existing code into model.py\n",
    "\n",
    "A Python package requires our code to be in a .py file, as opposed to notebook cells. So, we simply copy and paste our existing code for the previous notebook into a single file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the cell below, we write the contents of the cell into `model.py` packaging the model we \n",
    "developed in the previous labs so that we can deploy it to AI Platform Training Service.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Lab Task #1**: Organizing your training code into a Python package\n",
    "\n",
    "There are two places to fill in TODOs in `model.py`. \n",
    "\n",
    " * in the `build_dnn_model` function, add code to use a optimizer with a custom learning rate.\n",
    " * in the `train_and_evaluate` function, add code to define variables using the `hparams` dictionary."
   ]
  },
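  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a reminder of the generic Keras pattern (an illustration on a toy model, not the lab solution): the optimizer object takes the learning rate, and the optimizer is then passed to `model.compile` together with a loss and metrics. Adapt the idea inside `build_dnn_model` below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "# Illustrative sketch only (toy model, not the taxifare model): pass a custom\n",
    "# learning rate to the optimizer and hand the optimizer to model.compile().\n",
    "toy_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])\n",
    "toy_model.compile(\n",
    "    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # custom learning rate\n",
    "    loss='mse',\n",
    "    metrics=['mse'],\n",
    ")\n",
    "toy_model.summary()"
   ]
  },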
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile ./taxifare/trainer/model.py\n",
    "import datetime\n",
    "import logging\n",
    "import os\n",
    "import shutil\n",
    "\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "\n",
    "from tensorflow.keras import activations\n",
    "from tensorflow.keras import callbacks\n",
    "from tensorflow.keras import layers\n",
    "from tensorflow.keras import models\n",
    "\n",
    "from tensorflow import feature_column as fc\n",
    "\n",
    "logging.info(tf.version.VERSION)\n",
    "\n",
    "\n",
    "CSV_COLUMNS = [\n",
    "        'fare_amount',\n",
    "        'pickup_datetime',\n",
    "        'pickup_longitude',\n",
    "        'pickup_latitude',\n",
    "        'dropoff_longitude',\n",
    "        'dropoff_latitude',\n",
    "        'passenger_count',\n",
    "        'key',\n",
    "]\n",
    "LABEL_COLUMN = 'fare_amount'\n",
    "DEFAULTS = [[0.0], ['na'], [0.0], [0.0], [0.0], [0.0], [0.0], ['na']]\n",
    "DAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']\n",
    "\n",
    "\n",
    "def features_and_labels(row_data):\n",
    "    for unwanted_col in ['key']:\n",
    "        row_data.pop(unwanted_col)\n",
    "    label = row_data.pop(LABEL_COLUMN)\n",
    "    return row_data, label\n",
    "\n",
    "\n",
    "def load_dataset(pattern, batch_size, num_repeat):\n",
    "    dataset = tf.data.experimental.make_csv_dataset(\n",
    "        file_pattern=pattern,\n",
    "        batch_size=batch_size,\n",
    "        column_names=CSV_COLUMNS,\n",
    "        column_defaults=DEFAULTS,\n",
    "        num_epochs=num_repeat,\n",
    "    )\n",
    "    return dataset.map(features_and_labels)\n",
    "\n",
    "\n",
    "def create_train_dataset(pattern, batch_size):\n",
    "    dataset = load_dataset(pattern, batch_size, num_repeat=None)\n",
    "    return dataset.prefetch(1)\n",
    "\n",
    "\n",
    "def create_eval_dataset(pattern, batch_size):\n",
    "    dataset = load_dataset(pattern, batch_size, num_repeat=1)\n",
    "    return dataset.prefetch(1)\n",
    "\n",
    "\n",
    "def parse_datetime(s):\n",
    "    if type(s) is not str:\n",
    "        s = s.numpy().decode('utf-8')\n",
    "    return datetime.datetime.strptime(s, \"%Y-%m-%d %H:%M:%S %Z\")\n",
    "\n",
    "\n",
    "def euclidean(params):\n",
    "    lon1, lat1, lon2, lat2 = params\n",
    "    londiff = lon2 - lon1\n",
    "    latdiff = lat2 - lat1\n",
    "    return tf.sqrt(londiff*londiff + latdiff*latdiff)\n",
    "\n",
    "\n",
    "def get_dayofweek(s):\n",
    "    ts = parse_datetime(s)\n",
    "    return DAYS[ts.weekday()]\n",
    "\n",
    "\n",
    "@tf.function\n",
    "def dayofweek(ts_in):\n",
    "    return tf.map_fn(\n",
    "        lambda s: tf.py_function(get_dayofweek, inp=[s], Tout=tf.string),\n",
    "        ts_in\n",
    "    )\n",
    "\n",
    "\n",
    "@tf.function\n",
    "def fare_thresh(x):\n",
    "    return 60 * activations.relu(x)\n",
    "\n",
    "\n",
    "def transform(inputs, NUMERIC_COLS, STRING_COLS, nbuckets):\n",
    "    # Pass-through columns\n",
    "    transformed = inputs.copy()\n",
    "    del transformed['pickup_datetime']\n",
    "\n",
    "    feature_columns = {\n",
    "        colname: fc.numeric_column(colname)\n",
    "        for colname in NUMERIC_COLS\n",
    "    }\n",
    "\n",
    "    # Scaling longitude from range [-70, -78] to [0, 1]\n",
    "    for lon_col in ['pickup_longitude', 'dropoff_longitude']:\n",
    "        transformed[lon_col] = layers.Lambda(\n",
    "            lambda x: (x + 78)/8.0,\n",
    "            name='scale_{}'.format(lon_col)\n",
    "        )(inputs[lon_col])\n",
    "\n",
    "    # Scaling latitude from range [37, 45] to [0, 1]\n",
    "    for lat_col in ['pickup_latitude', 'dropoff_latitude']:\n",
    "        transformed[lat_col] = layers.Lambda(\n",
    "            lambda x: (x - 37)/8.0,\n",
    "            name='scale_{}'.format(lat_col)\n",
    "        )(inputs[lat_col])\n",
    "\n",
    "    # Adding Euclidean dist (no need to be accurate: NN will calibrate it)\n",
    "    transformed['euclidean'] = layers.Lambda(euclidean, name='euclidean')([\n",
    "        inputs['pickup_longitude'],\n",
    "        inputs['pickup_latitude'],\n",
    "        inputs['dropoff_longitude'],\n",
    "        inputs['dropoff_latitude']\n",
    "    ])\n",
    "    feature_columns['euclidean'] = fc.numeric_column('euclidean')\n",
    "\n",
    "    # hour of day from timestamp of form '2010-02-08 09:17:00+00:00'\n",
    "    transformed['hourofday'] = layers.Lambda(\n",
    "        lambda x: tf.strings.to_number(\n",
    "            tf.strings.substr(x, 11, 2), out_type=tf.dtypes.int32),\n",
    "        name='hourofday'\n",
    "    )(inputs['pickup_datetime'])\n",
    "    feature_columns['hourofday'] = fc.indicator_column(\n",
    "        fc.categorical_column_with_identity(\n",
    "            'hourofday', num_buckets=24))\n",
    "\n",
    "    latbuckets = np.linspace(0, 1, nbuckets).tolist()\n",
    "    lonbuckets = np.linspace(0, 1, nbuckets).tolist()\n",
    "    b_plat = fc.bucketized_column(\n",
    "        feature_columns['pickup_latitude'], latbuckets)\n",
    "    b_dlat = fc.bucketized_column(\n",
    "            feature_columns['dropoff_latitude'], latbuckets)\n",
    "    b_plon = fc.bucketized_column(\n",
    "            feature_columns['pickup_longitude'], lonbuckets)\n",
    "    b_dlon = fc.bucketized_column(\n",
    "            feature_columns['dropoff_longitude'], lonbuckets)\n",
    "    ploc = fc.crossed_column(\n",
    "            [b_plat, b_plon], nbuckets * nbuckets)\n",
    "    dloc = fc.crossed_column(\n",
    "            [b_dlat, b_dlon], nbuckets * nbuckets)\n",
    "    pd_pair = fc.crossed_column([ploc, dloc], nbuckets ** 4)\n",
    "    feature_columns['pickup_and_dropoff'] = fc.embedding_column(\n",
    "            pd_pair, 100)\n",
    "\n",
    "    return transformed, feature_columns\n",
    "\n",
    "\n",
    "def rmse(y_true, y_pred):\n",
    "    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))\n",
    "\n",
    "\n",
    "def build_dnn_model(nbuckets, nnsize, lr):\n",
    "    # input layer is all float except for pickup_datetime which is a string\n",
    "    STRING_COLS = ['pickup_datetime']\n",
    "    NUMERIC_COLS = (\n",
    "            set(CSV_COLUMNS) - set([LABEL_COLUMN, 'key']) - set(STRING_COLS)\n",
    "    )\n",
    "    inputs = {\n",
    "        colname: layers.Input(name=colname, shape=(), dtype='float32')\n",
    "        for colname in NUMERIC_COLS\n",
    "    }\n",
    "    inputs.update({\n",
    "        colname: layers.Input(name=colname, shape=(), dtype='string')\n",
    "        for colname in STRING_COLS\n",
    "    })\n",
    "\n",
    "    # transforms\n",
    "    transformed, feature_columns = transform(\n",
    "        inputs, NUMERIC_COLS, STRING_COLS, nbuckets=nbuckets)\n",
    "    dnn_inputs = layers.DenseFeatures(feature_columns.values())(transformed)\n",
    "\n",
    "    x = dnn_inputs\n",
    "    for layer, nodes in enumerate(nnsize):\n",
    "        x = layers.Dense(nodes, activation='relu', name='h{}'.format(layer))(x)\n",
    "    output = layers.Dense(1, name='fare')(x)\n",
    "\n",
    "    model = models.Model(inputs, output)\n",
    "    \n",
    "# TODO 1a: Your code here\n",
    "    \n",
    "    return model\n",
    "\n",
    "\n",
    "def train_and_evaluate(hparams):\n",
    "# TODO 1b: Your code here\n",
    "    nnsize = hparams['nnsize']\n",
    "    eval_data_path = hparams['eval_data_path']\n",
    "    num_evals = hparams['num_evals']\n",
    "    num_examples_to_train_on = hparams['num_examples_to_train_on']\n",
    "    output_dir = hparams['output_dir']\n",
    "    train_data_path = hparams['train_data_path']\n",
    "\n",
    "    timestamp = datetime.datetime.now().strftime('%Y%m%d%H%M%S')\n",
    "    savedmodel_dir = os.path.join(output_dir, 'export/savedmodel')\n",
    "    model_export_path = os.path.join(savedmodel_dir, timestamp)\n",
    "    checkpoint_path = os.path.join(output_dir, 'checkpoints')\n",
    "    tensorboard_path = os.path.join(output_dir, 'tensorboard')\n",
    "\n",
    "    if tf.io.gfile.exists(output_dir):\n",
    "        tf.io.gfile.rmtree(output_dir)\n",
    "\n",
    "    model = build_dnn_model(nbuckets, nnsize, lr)\n",
    "    logging.info(model.summary())\n",
    "\n",
    "    trainds = create_train_dataset(train_data_path, batch_size)\n",
    "    evalds = create_eval_dataset(eval_data_path, batch_size)\n",
    "\n",
    "    steps_per_epoch = num_examples_to_train_on // (batch_size * num_evals)\n",
    "\n",
    "    checkpoint_cb = callbacks.ModelCheckpoint(\n",
    "        checkpoint_path,\n",
    "        save_weights_only=True,\n",
    "        verbose=1\n",
    "    )\n",
    "    tensorboard_cb = callbacks.TensorBoard(tensorboard_path)\n",
    "\n",
    "    history = model.fit(\n",
    "        trainds,\n",
    "        validation_data=evalds,\n",
    "        epochs=num_evals,\n",
    "        steps_per_epoch=max(1, steps_per_epoch),\n",
    "        verbose=2,  # 0=silent, 1=progress bar, 2=one line per epoch\n",
    "        callbacks=[checkpoint_cb, tensorboard_cb]\n",
    "    )\n",
    "\n",
    "    # Exporting the model with default serving function.\n",
    "    tf.saved_model.save(model, model_export_path)\n",
    "    return history\n"
   ]
  },
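  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, you can smoke-test the input pipeline of the package we just wrote. The sketch below assumes the small test CSVs shipped with the lab repo under `./taxifare/tests/data/` and adds `./taxifare` to the Python path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "# Optional smoke test (a sketch): read one small batch through the input pipeline\n",
    "# defined in trainer/model.py, using the tiny local test CSVs from the lab repo.\n",
    "sys.path.append('./taxifare')\n",
    "\n",
    "from trainer import model as taxifare_model\n",
    "\n",
    "ds = taxifare_model.create_train_dataset(\n",
    "    './taxifare/tests/data/taxi-train*', batch_size=2)\n",
    "features, label = next(iter(ds))\n",
    "print('label:', label.numpy())\n",
    "print('feature keys:', sorted(features.keys()))"
   ]
  },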
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modify code to read data from and write checkpoint files to GCS \n",
    "\n",
    "If you look closely above, you'll notice a new function, `train_and_evaluate` that wraps the code that actually trains the model. This allows us to parametrize the training by passing a dictionary of parameters to this function (e.g, `batch_size`, `num_examples_to_train_on`, `train_data_path` etc.)\n",
    "\n",
    "This is useful because the output directory, data paths and number of train steps will be different depending on whether we're training locally or in the cloud. Parametrizing allows us to use the same code for both.\n",
    "\n",
    "We specify these parameters at run time via the command line. Which means we need to add code to parse command line parameters and invoke `train_and_evaluate()` with those params. This is the job of the `task.py` file. "
   ]
  },
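  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a standalone sketch of that pattern (hypothetical flags, parsing a hard-coded argument list rather than the real command line): `argparse` collects the flags into a namespace, and `vars()` (equivalently `args.__dict__`) turns it into the dictionary we hand to `train_and_evaluate()`. The real `task.py` follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import argparse\n",
    "\n",
    "# Standalone sketch of the pattern task.py uses below: parse flags, then convert\n",
    "# the resulting Namespace into a plain dict of hyperparameters.\n",
    "parser = argparse.ArgumentParser()\n",
    "parser.add_argument('--batch_size', type=int, default=32)\n",
    "parser.add_argument('--output_dir', default='./taxifare-model')\n",
    "\n",
    "args = parser.parse_args(['--batch_size', '64'])  # a sample argv list, not sys.argv\n",
    "hparams = vars(args)\n",
    "print(hparams)  # {'batch_size': 64, 'output_dir': './taxifare-model'}"
   ]
  },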
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile taxifare/trainer/task.py\n",
    "import argparse\n",
    "\n",
    "from trainer import model\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    parser = argparse.ArgumentParser()\n",
    "    parser.add_argument(\n",
    "        \"--batch_size\",\n",
    "        help=\"Batch size for training steps\",\n",
    "        type=int,\n",
    "        default=32\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--eval_data_path\",\n",
    "        help=\"GCS location pattern of eval files\",\n",
    "        required=True\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--nnsize\",\n",
    "        help=\"Hidden layer sizes (provide space-separated sizes)\",\n",
    "        nargs=\"+\",\n",
    "        type=int,\n",
    "        default=[32, 8]\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--nbuckets\",\n",
    "        help=\"Number of buckets to divide lat and lon with\",\n",
    "        type=int,\n",
    "        default=10\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--lr\",\n",
    "        help = \"learning rate for optimizer\",\n",
    "        type = float,\n",
    "        default = 0.001\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--num_evals\",\n",
    "        help=\"Number of times to evaluate model on eval data training.\",\n",
    "        type=int,\n",
    "        default=5\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--num_examples_to_train_on\",\n",
    "        help=\"Number of examples to train on.\",\n",
    "        type=int,\n",
    "        default=100\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--output_dir\",\n",
    "        help=\"GCS location to write checkpoints and export models\",\n",
    "        required=True\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--train_data_path\",\n",
    "        help=\"GCS location pattern of train files containing eval URLs\",\n",
    "        required=True\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--job-dir\",\n",
    "        help=\"this model ignores this field, but it is required by gcloud\",\n",
    "        default=\"junk\"\n",
    "    )\n",
    "    args = parser.parse_args()\n",
    "    hparams = args.__dict__\n",
    "    hparams.pop(\"job-dir\", None)\n",
    "\n",
    "    model.train_and_evaluate(hparams)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run trainer module package locally\n",
    "\n",
    "Now we can test our training code locally as follows using the local test data. We'll run a very small training job over a single file with a small batch size and one eval step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "EVAL_DATA_PATH=./taxifare/tests/data/taxi-valid*\n",
    "TRAIN_DATA_PATH=./taxifare/tests/data/taxi-train*\n",
    "OUTPUT_DIR=./taxifare-model\n",
    "\n",
    "test ${OUTPUT_DIR} && rm -rf ${OUTPUT_DIR}\n",
    "export PYTHONPATH=${PYTHONPATH}:${PWD}/taxifare\n",
    "    \n",
    "python3 -m trainer.task \\\n",
    "--eval_data_path $EVAL_DATA_PATH \\\n",
    "--output_dir $OUTPUT_DIR \\\n",
    "--train_data_path $TRAIN_DATA_PATH \\\n",
    "--batch_size 5 \\\n",
    "--num_examples_to_train_on 100 \\\n",
    "--num_evals 1 \\\n",
    "--nbuckets 10 \\\n",
    "--lr 0.001 \\\n",
    "--nnsize 32 8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run your training package on Cloud AI Platform\n",
    "\n",
    "Once the code works in standalone mode locally, you can run it on Cloud AI Platform. To submit to the Cloud we use [`gcloud ai-platform jobs submit training [jobname]`](https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training) and simply specify some additional parameters for AI Platform Training Service:\n",
    "- jobid: A unique identifier for the Cloud job. We usually append system time to ensure uniqueness\n",
    "- region: Cloud region to train in. See [here](https://cloud.google.com/ml-engine/docs/tensorflow/regions) for supported AI Platform Training Service regions\n",
    "\n",
    "The arguments before `-- \\` are for AI Platform Training Service.\n",
    "The arguments after `-- \\` are sent to our `task.py`.\n",
    "\n",
    "Because this is on the entire dataset, it will take a while. You can monitor the job from the GCP console in the Cloud AI Platform section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Lab Task #2**: Train your model using cloud infrastructure via Google Cloud AI Platform Training Service\n",
    "Fill in the TODOs in the code below to submit your job for training on Cloud AI Platform. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "# Output directory and jobID\n",
    "OUTDIR=gs://${BUCKET}/taxifare/trained_model_$(date -u +%y%m%d_%H%M%S)\n",
    "JOBID=taxifare_$(date -u +%y%m%d_%H%M%S)\n",
    "echo ${OUTDIR} ${REGION} ${JOBID}\n",
    "gcloud storage rm --recursive --continue-on-error ${OUTDIR}\n",    "\n",
    "# Model and training hyperparameters\n",
    "BATCH_SIZE=50\n",
    "NUM_EXAMPLES_TO_TRAIN_ON=100\n",
    "NUM_EVALS=100\n",
    "NBUCKETS=10\n",
    "LR=0.001\n",
    "NNSIZE=\"32 8\"\n",
    "\n",
    "# GCS paths\n",
    "GCS_PROJECT_PATH=gs://$BUCKET/taxifare\n",
    "DATA_PATH=$GCS_PROJECT_PATH/data\n",
    "TRAIN_DATA_PATH=$DATA_PATH/taxi-train*\n",
    "EVAL_DATA_PATH=$DATA_PATH/taxi-valid*\n",
    "\n",
    "gcloud ai-platform jobs submit training $JOBID \\\n",
    "# TODO 2: Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### (Optional) Run your training package using Docker container\n",
    "\n",
    "AI Platform Training also supports training in custom containers, allowing users to bring their own Docker containers with any pre-installed ML framework or algorithm to run on AI Platform Training. \n",
    "\n",
    "In this last section, we'll see how to submit a Cloud training job using a customized Docker image. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Containerizing our `./taxifare/trainer` package involves 3 steps:\n",
    "\n",
    "* Writing a Dockerfile in `./taxifare`\n",
    "* Building the Docker image\n",
    "* Pushing it to the Google Cloud container registry in our GCP project"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Dockerfile` specifies\n",
    "1. How the container needs to be provisioned so that all the dependencies in our code are satisfied\n",
    "2. Where to copy our trainer Package in the container and how to install it (`pip install /trainer`)\n",
    "3. What command to run when the container is ran (the `ENTRYPOINT` line)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Lab Task #3**: Running your training package using Docker containers.\n",
    "Fill in the TODOs in the code below for Dockerfile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile ./taxifare/Dockerfile\n",
    "FROM gcr.io/deeplearning-platform-release/tf2-cpu\n",
    "# TODO 3: Your code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud auth configure-docker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash \n",
    "\n",
    "PROJECT_DIR=$(cd ./taxifare && pwd)\n",
    "PROJECT_ID=$(gcloud config list project --format \"value(core.project)\")\n",
    "IMAGE_NAME=taxifare_training_container\n",
    "DOCKERFILE=$PROJECT_DIR/Dockerfile\n",
    "IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_NAME\n",
    "\n",
    "docker build $PROJECT_DIR -f $DOCKERFILE -t $IMAGE_URI\n",
    "\n",
    "docker push $IMAGE_URI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Remark:** If you prefer to build the container image from the command line, we have written a script for that `./taxifare/scripts/build.sh`. This script reads its configuration from the file `./taxifare/scripts/env.sh`. You can configure these arguments the way you want in that file. You can also simply type `make build` from within `./taxifare` to build the image (which will invoke the build script). Similarly, we wrote the script `./taxifare/scripts/push.sh` to push the Docker image, which you can also trigger by typing `make push` from within `./taxifare`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train using a custom container on AI Platform\n",
    "\n",
    "To submit to the Cloud we use [`gcloud ai-platform jobs submit training [jobname]`](https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training) and simply specify some additional parameters for AI Platform Training Service:\n",
    "- jobname: A unique identifier for the Cloud job. We usually append system time to ensure uniqueness\n",
    "- master-image-uri: The uri of the Docker image we pushed in the Google Cloud registry\n",
    "- region: Cloud region to train in. See [here](https://cloud.google.com/ml-engine/docs/tensorflow/regions) for supported AI Platform Training Service regions\n",
    "\n",
    "\n",
    "The arguments before `-- \\` are for AI Platform Training Service.\n",
    "The arguments after `-- \\` are sent to our `task.py`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can track your job and view logs using [cloud console](https://console.cloud.google.com/mlengine/jobs)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "PROJECT_ID=$(gcloud config list project --format \"value(core.project)\")\n",
    "BUCKET=$PROJECT_ID\n",
    "REGION=\"us-central1\"\n",
    "\n",
    "# Output directory and jobID\n",
    "OUTDIR=gs://${BUCKET}/taxifare/trained_model_$(date -u +%y%m%d_%H%M%S)\n",
    "JOBID=taxifare_container_$(date -u +%y%m%d_%H%M%S)\n",
    "echo ${OUTDIR} ${REGION} ${JOBID}\n",
    "gcloud storage rm --recursive --continue-on-error ${OUTDIR}\n",    "\n",
    "# Model and training hyperparameters\n",
    "BATCH_SIZE=50\n",
    "NUM_EXAMPLES_TO_TRAIN_ON=100\n",
    "NUM_EVALS=100\n",
    "NBUCKETS=10\n",
    "NNSIZE=\"32 8\"\n",
    "\n",
    "# AI-Platform machines to use for training\n",
    "MACHINE_TYPE=n1-standard-4\n",
    "SCALE_TIER=CUSTOM\n",
    "\n",
    "# GCS paths.\n",
    "GCS_PROJECT_PATH=gs://$BUCKET/taxifare\n",
    "DATA_PATH=$GCS_PROJECT_PATH/data\n",
    "TRAIN_DATA_PATH=$DATA_PATH/taxi-train*\n",
    "EVAL_DATA_PATH=$DATA_PATH/taxi-valid*\n",
    "\n",
    "IMAGE_NAME=taxifare_training_container\n",
    "IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_NAME\n",
    "\n",
    "gcloud beta ai-platform jobs submit training $JOBID \\\n",
    "   --staging-bucket=gs://$BUCKET \\\n",
    "   --region=$REGION \\\n",
    "   --master-image-uri=$IMAGE_URI \\\n",
    "   --master-machine-type=$MACHINE_TYPE \\\n",
    "   --scale-tier=$SCALE_TIER \\\n",
    "  -- \\\n",
    "  --eval_data_path $EVAL_DATA_PATH \\\n",
    "  --output_dir $OUTDIR \\\n",
    "  --train_data_path $TRAIN_DATA_PATH \\\n",
    "  --batch_size $BATCH_SIZE \\\n",
    "  --num_examples_to_train_on $NUM_EXAMPLES_TO_TRAIN_ON \\\n",
    "  --num_evals $NUM_EVALS \\\n",
    "  --nbuckets $NBUCKETS \\\n",
    "  --nnsize $NNSIZE \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Remark:** If you prefer submitting your jobs for training on the AI-platform using the command line, we have written the `./taxifare/scripts/submit.sh` for you (that you can also invoke using `make submit` from within `./taxifare`). As the other scripts, it reads it configuration variables from `./taxifare/scripts/env.sh`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "environment": {
   "name": "tf2-gpu.2-1.m69",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-1:m69"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
